ABSTRACT

Iteratively  Reweighted  Least  Squares  (IRLS)  is  a  computationally 
attractive  method  for  providing  estimated  regression  coefficients  that  are 
relatively  unaffected  by  extreme  observations.  Definitions  and  statistical 
justifications  are  reviewed,  and  a  numerical  example  and  a  multivariate 
extension are included. This article is to be an entry in The Encyclopedia of
Statistical Sciences.

ITERATIVELY REWEIGHTED LEAST SQUARES - ENCYCLOPEDIA ENTRY


Donald  B.  Rubin 

Iteratively  reweighted  least  squares  (IRLS)  refers  to  an  iterative 
procedure  for  estimating  regression  coefficients:  at  each  iteration,  weighted 
least  squares  computations  are  performed,  where  the  weights  change  from 
iteration  to  iteration.  Although  IRLS  has  been  used  to  estimate  coefficients 
in  nonlinear  and  logistic  regressions,  currently,  IRLS  tends  to  be  associated 
with  robust  regression. 

1. IRLS for Robust Regression

When  using  IRLS  for  robust  regression,  the  weights  are  functions  of  the 
residuals  from  the  previous  iteration  such  that  points  with  larger  residuals 
receive  relatively  less  weight  than  points  with  smaller  residuals. 
Consequently,  unusual  points  tend  to  receive  less  weight  than  typical 
points. 

IRLS  is  a  popular  technique  for  obtaining  estimated  regression 
coefficients  that  are  relatively  unaffected  by  extreme  observations.  One 
reason  for  the  popularity  of  IRLS  is  that  it  can  be  easily  implemented  using 
readily  available  least  squares  algorithms.  Another  reason  is  that  it  can  be 
motivated from sound statistical principles (cf. [1], [4], [9]). A third
reason for its popularity is that some experience suggests it is a useful
practical tool when applied to real data (cf. [2], [7]). In order to define
IRLS for robust regression precisely, some notation is needed.


Sponsored by the United States Army under Contract No. DAAG29-80-C-0041.


2.  Weighted  Least  Squares  Computations 


Let Y be an n × 1 data matrix of n observations of a dependent
variable, let X be the associated n × p data matrix of n observations
of p predictor variables, and let W be an n × n diagonal matrix of
nonnegative weights, which for the moment we assume is fixed. Then the
weighted least squares estimate of the regression coefficient of Y on X is
given, as a function of W, by

$$ b(W) = (X^T W X)^{-1} (X^T W Y) , \tag{1} $$

if $X^T W X$ has rank p; b(W) is not defined otherwise.
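
Equation (1) can be sketched in a few lines of NumPy. This is only an illustration: the function name `wls_coef` and the small data set are assumptions made for the example, and the weights are passed as a vector holding the diagonal of W.

```python
import numpy as np

def wls_coef(X, Y, w):
    """Weighted least squares estimate b(W) from equation (1).

    X: (n, p) predictors; Y: (n,) or (n, q) responses;
    w: (n,) nonnegative weights, i.e. the diagonal of W.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    XtW = X.T * w                      # X^T W without forming the n x n matrix
    XtWX = XtW @ X
    if np.linalg.matrix_rank(XtWX) < X.shape[1]:
        raise ValueError("X^T W X has rank less than p; b(W) is not defined")
    return np.linalg.solve(XtWX, XtW @ Y)

# Hypothetical data: intercept plus one predictor, lying exactly on y = 1 + 2x.
X = np.column_stack([np.ones(5), np.arange(5.0)])
Y = 1.0 + 2.0 * np.arange(5.0)
b = wls_coef(X, Y, np.array([1.0, 2.0, 0.5, 1.0, 3.0]))
print(b)
```

Because the hypothetical responses lie exactly on a line, any choice of positive weights returns the same coefficients; the weights matter only once the data contain residual scatter.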

Theoretical justification for the estimator b(W) is straightforward.
Suppose that for fixed W, the conditional distribution of Y given X has
mean $X\beta$, where $\beta$ is the p × 1 regression coefficient to be estimated,
and variance $\sigma^2 W^{-1}$, where $\sigma^2$ is the residual variance, usually also
to be estimated. By noting that, for fixed W, $W^{1/2} Y$ has mean $W^{1/2} X\beta$
and variance $\sigma^2 I$, the standard Gauss-Markov arguments imply that b(W) is
the value of $\beta$ that minimizes the residual sum of squares
$(Y - X\beta)^T W (Y - X\beta)$ as well as the minimum variance unbiased estimator
of $\beta$. If the conditional distribution of Y given X is normal for fixed W,
then b(W) is also the maximum likelihood estimate of $\beta$, and the associated
maximum likelihood estimate of $\sigma^2$ is the weighted sum of squared residuals:

$$ s(W)^2 = [Y - X b(W)]^T W [Y - X b(W)] / n . \tag{2} $$
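
The transformation argument can be checked numerically. The sketch below uses simulated data (the seed, dimensions and coefficients are arbitrary assumptions for the illustration) to compare b(W) from equation (1) with ordinary least squares applied to the rescaled data W^{1/2} X and W^{1/2} Y, and then evaluates equation (2).

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed for the illustration
n, p = 20, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)       # diagonal of a fixed W

# Equation (1): b(W) = (X^T W X)^{-1} X^T W Y
XtW = X.T * w
b_W = np.linalg.solve(XtW @ X, XtW @ Y)

# Ordinary least squares on the rescaled data W^{1/2} X and W^{1/2} Y
root_w = np.sqrt(w)
b_ols, *_ = np.linalg.lstsq(X * root_w[:, None], Y * root_w, rcond=None)

# Equation (2): s(W)^2 = [Y - X b(W)]^T W [Y - X b(W)] / n
resid = Y - X @ b_W
s2_W = resid @ (w * resid) / n

print(np.allclose(b_W, b_ols), s2_W)
```

The two coefficient vectors agree up to rounding, which is the sense in which weighted least squares reduces to ordinary least squares on the rescaled data.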


IRLS is used when the weight matrix is not fixed. Specifically, IRLS
applies equation (1) to obtain $b^{(\ell+1)}$, the $(\ell+1)$st iterate of the
regression coefficient, from the weight matrix of the previous iteration,
$W^{(\ell)}$:

$$ b^{(\ell+1)} = b(W^{(\ell)}) . \tag{3} $$

In order to define a specific version of IRLS, we thus need only to define the
weight matrix $W^{(\ell)}$.

3. The Weight Matrix and Iterations for Robust Regression


For robust regression, the ith diagonal element in the weight matrix
$W^{(\ell)}$, $W_{ii}^{(\ell)}$, is a function w(·) of the ith standardized residual
obtained by using $b^{(\ell)}$ to predict $Y_i$:

$$ W_{ii}^{(\ell)} = w(z_i^{(\ell)}) = w(-z_i^{(\ell)}) \tag{4} $$

where

$$ z_i^{(\ell)} = (Y_i - X_i b^{(\ell)}) / s^{(\ell)} \tag{5} $$

and $s^{(\ell)}$ is the estimate of $\sigma$ at the $\ell$th iteration. A natural form
for $s^{(\ell)}$ based on likelihood criteria is given by equation (2) with
$W^{(\ell-1)}$ substituted for W, and thus, by equation (3), with $b^{(\ell)}$
substituted for b(W):

$$ s^{(\ell)} = s(W^{(\ell-1)}) . \tag{6} $$

The scalar function w(·) in (4) is a nonnegative and nonincreasing monotone
function and thus gives relatively smaller weight to points with larger
residuals, e.g., $w(z) = 2/(1 + z^2)$.

With a specified form for $s^{(\ell)}$ and a specified form for the function
w(·), IRLS proceeds by choosing a starting value $W^{(0)}$, e.g., the identity
matrix, and then calculating $b^{(1)}$ from equations (1) and (3), $s^{(1)}$ from
equations such as (2) and (6), and thence $W^{(1)}$ from equations (4) and (5);
from $W^{(1)}$, the next iterates $b^{(2)}$, $s^{(2)}$ and $W^{(2)}$ are calculated; the
procedure continues indefinitely unless some $s^{(\ell)} = 0$ or $X^T W^{(\ell)} X$ has
rank less than p. Experience suggests that for many choices of weight functions,
the iterations reliably converge.
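
The iteration described in this section can be sketched as follows. This is a minimal illustration rather than a production routine: the function names `irls` and `example_weight`, the fixed iteration count, and the data (a line with a small deterministic perturbation and one gross outlier) are all assumptions made for the example.

```python
import numpy as np

def irls(X, Y, weight_fn, n_iter=50):
    """IRLS for robust regression following equations (1)-(6).

    Returns the history [(b^(1), s^(1)), (b^(2), s^(2)), ...].
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, p = X.shape
    w = np.ones(n)                                  # W^(0) = identity
    history = []
    for _ in range(n_iter):
        XtW = X.T * w
        XtWX = XtW @ X
        if np.linalg.matrix_rank(XtWX) < p:
            break                                   # b(W) is not defined
        b = np.linalg.solve(XtWX, XtW @ Y)          # equations (1) and (3)
        resid = Y - X @ b
        s = np.sqrt(resid @ (w * resid) / n)        # equations (2) and (6)
        history.append((b, s))
        if s == 0.0:
            break                                   # z_i would be undefined
        z = resid / s                               # equation (5)
        w = weight_fn(z)                            # equation (4)
    return history

def example_weight(z):
    return 2.0 / (1.0 + z ** 2)                     # the example w(z) from the text

# Hypothetical data: roughly y = 1 + 2x, with one gross outlier at x = 7.
x = np.arange(10.0)
X = np.column_stack([np.ones_like(x), x])
Y = 1.0 + 2.0 * x + 0.3 * np.sin(1.7 * x)
Y[7] += 50.0

history = irls(X, Y, example_weight)
b_ols, _ = history[0]       # W^(0) = I, so the first iterate is ordinary least squares
b_irls, _ = history[-1]
print(b_ols, b_irls)
```

A fixed number of iterations keeps the sketch short; a practical routine would more likely stop once successive iterates differ by less than a chosen tolerance.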

4. Statistical Justifications for IRLS

A general statistical justification for IRLS for robust regression arises
from the fact that it can be viewed as a process of successive substitution
applied to the equations for M-estimates ([1], [2], [8], [9], [10]). Numerical
behavior of IRLS for robust regression is considered in [3], [6], [10], [11].

A more specialized justification for IRLS, which is consistent with
statistical principles of efficient estimation, arises from the fact that some
M-estimates are maximum likelihood estimates under special distributional
forms for the conditional distribution of Y given X. When M-estimates are
maximum likelihood estimates, the associated IRLS algorithm is an EM algorithm
([4], especially pp. 19-20), and consequently, general convergence results
about EM algorithms apply to IRLS algorithms; important results are that each
step of IRLS increases the likelihood and, under weak conditions, IRLS
converges to a local maximum of the likelihood function. Details of the
relationship between IRLS and EM, including general results on large and small
sample rates of convergence, are given in [5].

5. IRLS/EM for the t-distribution

A specific example when IRLS is EM occurs when the specification for the
conditional distribution of $Y_i$ given $X_i$ is a scaled t-distribution on r
degrees of freedom. Then the associated weight function for IRLS is
$w(z) = (r + 1)/(r + z^2)$, and the large sample rate of convergence for IRLS
is 3/(r + 3). More generally, if d(z) is the probability density function
specified for the conditional distribution of $Y_i$ given $X_i$, then the
associated weight function is defined by

$$ w(z) = -d'(z) / [z\, d(z)] \quad \text{for } z \neq 0 , $$

$$ w(z) = \lim_{z \to 0} -d'(z) / [z\, d(z)] \quad \text{for } z = 0 . $$
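
As a worked check of the general definition, the t weight function can be recovered from the t density on r degrees of freedom, written up to its normalizing constant (which cancels in the ratio):

```latex
d(z) \propto \left(1 + \frac{z^2}{r}\right)^{-(r+1)/2},
\qquad
\frac{d'(z)}{d(z)} = -\frac{r+1}{2}\cdot\frac{2z/r}{1 + z^2/r}
                   = -\frac{(r+1)\,z}{r + z^2},
\qquad
w(z) = -\frac{d'(z)}{z\,d(z)} = \frac{r+1}{r + z^2}.
```

The same calculation with the standard normal density, d(z) proportional to exp(-z^2/2), gives d'(z)/d(z) = -z and hence w(z) = 1, so that every point receives the same weight and IRLS reduces to ordinary least squares.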

A small numerical example is given in [5] and summarized here. Ten
observations were drawn from a t-distribution on 3 degrees of freedom (-0.141,
0.678, -0.036, -0.350, -5.005, 0.886, 0.485, -4.154, 1.415, 1.546). The
results of twenty steps of IRLS starting from $W^{(0)} = I$ are given in Table
1. The empirical rate of convergence for both $b^{(\ell)}$ and $s^{(\ell)}$ at the 20th
iteration is 0.6805, which agrees well with the theoretical small sample rate
of convergence of 0.6806 as calculated in [5]; the large sample rate of
convergence is 0.5. Since the rate of convergence of an EM algorithm is
proportional to the fraction of information in the observed data (i.e., in
Y and X in the robust regression context) relative to the information in
the observed and missing data (i.e., in Y, X and W), we see that in this
example the observed data have relatively more information about $\beta$ and $\sigma$
than is typical for samples of size ten from a t on three degrees of
freedom. Further discussion of these points is given in [5].
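
The following sketch runs the location-only case (X a column of ones) on the ten listed observations with the t weight function for r = 3, and records the log likelihood at each step to illustrate the EM property that it does not decrease. The helper name `t_loglik` is an assumption made for the example. The printed iterates are not claimed to reproduce the values tabulated in [5]: the observations as listed carry only three decimals, and the scaling convention behind the tabulated scale values is not restated in this entry.

```python
import numpy as np

y = np.array([-0.141, 0.678, -0.036, -0.350, -5.005,
              0.886, 0.485, -4.154, 1.415, 1.546])
r = 3.0                                   # degrees of freedom
n = y.size

def t_loglik(b, s2):
    """Log likelihood of a scaled t on r d.f., up to an additive constant."""
    z2 = (y - b) ** 2 / s2
    return -0.5 * n * np.log(s2) - 0.5 * (r + 1.0) * np.sum(np.log1p(z2 / r))

w = np.ones(n)                            # W^(0) = I
history = []
for _ in range(20):
    b = np.sum(w * y) / np.sum(w)         # equation (1) with X a column of ones
    s2 = np.sum(w * (y - b) ** 2) / n     # equation (2)
    history.append((b, s2, t_loglik(b, s2)))
    z2 = (y - b) ** 2 / s2                # squared standardized residuals
    w = (r + 1.0) / (r + z2)              # t weight function

logliks = [h[2] for h in history]
for step, (b, s2, ll) in enumerate(history, start=1):
    print(step, round(b, 6), round(s2, 6), round(ll, 6))
```

With the identity starting weights, the first iterate of the location is simply the sample mean; later iterates move away from the two large negative observations as their weights shrink.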


Table 1. Successive iterations of IRLS for example

Iteration $\ell$    $\beta^{(\ell)}$    $\sigma^{(\ell)2}$

     1         -0.467496         1.537750
     2          0.103069         1.673303
     3          0.240781         1.603189
     4          0.277822         1.524210
     5          0.292411         1.466860
     6          0.300280         1.427958
     7          0.305188         1.401828
     8          0.308413         1.384252
     9          0.310571         1.372393
    10          0.312027         1.364371
    11          0.313012         1.358934
    12          0.313680         1.355244
    13          0.314133         1.352738
    14          0.314442         1.351035
    15          0.314651         1.349876
    16          0.314794         1.349088
    17          0.314890         1.348552
    18          0.314956         1.348188
    19          0.315001         1.347939
    20          0.315032         1.347771
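
An empirical rate of convergence can be estimated from the table as the ratio of successive differences between iterates. The sketch below does this for the last three rows of Table 1; the function name `empirical_rate` is an assumption made for the example. Because the tabulated values carry only six decimals, ratios formed this late in the sequence are accurate to roughly two digits, so they should land near, not exactly on, the 0.6805 quoted in the text.

```python
import numpy as np

# Last three rows of Table 1 (iterations 18, 19, 20), one array per column.
col1 = np.array([0.314956, 0.315001, 0.315032])   # regression coefficient
col2 = np.array([1.348188, 1.347939, 1.347771])   # squared scale

def empirical_rate(x):
    """Ratio of the last two successive differences of a sequence."""
    d = np.diff(x)
    return d[-1] / d[-2]

rate_col1 = empirical_rate(col1)
rate_col2 = empirical_rate(col2)

r = 3
large_sample_rate = 3 / (r + 3)    # 3/(r + 3) from the text; 0.5 for r = 3

print(rate_col1, rate_col2, large_sample_rate)
```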

6.  A  Multivariate  Extension 

A potentially quite useful and simple generalization of the use of
IRLS/EM for the t-distribution has apparently not yet appeared in the
literature and illustrates the flexibility of IRLS. Suppose $Y_i$ is
q-variate and $X_i$ is p-variate as before, where $\beta$ is now p × q, and let
the conditional distribution of $Y_i - X_i \beta$ given $X_i$ be a zero-centered
linear transformation of a q-variate spherically symmetric t-distribution on r
degrees of freedom. Then the previous notation and equations apply with the
following simple modifications: b(W) defined by (1) is now p × q, $s(W)^2$
defined by (2) is now q × q, the weight function is given by

$$ w(z_i) = (r + q)/(r + z_i^2) \tag{7} $$

where at the $\ell$th iteration

$$ z_i^2 = (Y_i - X_i b^{(\ell)}) [s^{(\ell)2}]^{-1} (Y_i - X_i b^{(\ell)})^T . \tag{8} $$

IRLS begins with a starting value, $W^{(0)}$, e.g. the identity matrix,
calculates the p × q matrix $b^{(1)}$ from equations (1) and (3), the q × q
matrix $s^{(1)2}$ from equations (2) and (6), and thence the n × n diagonal
matrix $W^{(1)}$ from equations (4), (7) and (8); $W^{(1)}$ leads to the next
iterates $b^{(2)}$, $s^{(2)2}$, and so forth.
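
The multivariate iteration can be sketched along the same lines as the univariate one. The function name `irls_mvt`, the fixed iteration count, and the simulated data with three shifted rows are assumptions made for the illustration; the weights are again stored as a vector holding the diagonal of W.

```python
import numpy as np

def irls_mvt(X, Y, r, n_iter=100):
    """IRLS/EM sketch for q-variate t errors, following equations (1)-(8)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, q = Y.shape
    w = np.ones(n)                                   # W^(0) = I
    for _ in range(n_iter):
        XtW = X.T * w
        b = np.linalg.solve(XtW @ X, XtW @ Y)        # p x q, equations (1) and (3)
        resid = Y - X @ b                            # n x q residual rows
        s2 = (resid.T * w) @ resid / n               # q x q, equation (2)
        # z_i^2 = resid_i [s2]^{-1} resid_i^T, equation (8)
        z2 = np.einsum("ij,ij->i", resid @ np.linalg.inv(s2), resid)
        w = (r + q) / (r + z2)                       # equation (7)
    return b, s2, w

# Hypothetical data: n = 40, p = 2 (intercept and one predictor), q = 2.
rng = np.random.default_rng(1)                       # arbitrary seed
n, p, q = 40, 2, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
B = np.array([[1.0, -1.0], [2.0, 0.5]])
Y = X @ B + rng.normal(size=(n, q))
Y[:3] += 25.0                                        # three gross outliers

b, s2, w = irls_mvt(X, Y, r=3.0)
d = np.sqrt(np.diag(s2))
corr = s2 / np.outer(d, d)                           # correlation form of s2
print(b, corr, w[:3], sep="\n")
```

Dividing the scale matrix by the outer product of its root diagonal, as in the last lines, gives the correlation matrix among the q residual components.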

Under the t-specification, IRLS is EM and so each iteration increases the
likelihood of the p × q location parameter $\beta$ and the q × q scale
parameter $\sigma^2$, and under weak conditions, the iterations will converge to
maximum likelihood estimates of $\beta$ and $\sigma^2$. IRLS thus provides a positive
semi-definite estimate of the matrix of partial correlations among the q
components of $Y_i$ assuming the conditional distribution of $Y_i$ given $X_i$ is
elliptically symmetric and long tailed (if r is chosen to be small). Some
limited experience with real data suggests that this use of IRLS does yield
estimates of correlation matrices rather unaffected by extreme observations.


REFERENCES


[1] Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J.,
Rogers, W. H., and Tukey, J. W. (1972). Robust Estimates of
Location: Survey and Advances. Princeton University Press.

[2] Beaton, A. E. and Tukey, J. W. (1974). The fitting of power series.
Technometrics 16, 147-185.

[3] Byrd, R. H. and Pyne, D. A. (1979). ASA Proceedings on Statistical
Computations, 68-71.

[4] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). J. Roy.
Statist. Soc. Ser. B 39, 1-38.

[5] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1980). Multivariate
Analysis - V. North-Holland, 35-57.

[6] Dutter, R. (1977). J. Statist. Comput. Simul. 5, 207-238.

[7] Eddy, W. and Kadane, J. (1982). J. Amer. Statist. Assn.

[8] Holland, P. W. and Welsch, R. E. (1977). Commun. Statist. A6, 813-827.

[9] Huber, P. J. (1964). Ann. Math. Statist. 35, 73-101.

[10] Huber, P. J. (1981). Robust Statistics. Wiley.

[11] Klein, R. and Yohai, V. J. (1981). Commun. Statist. A10, 2373-2388.




REPORT DOCUMENTATION PAGE

Report number: #2328
Title: Iteratively Reweighted Least Squares - Encyclopedia Entry
Type of report and period covered: Summary Report - no specific reporting period
Author: Donald B. Rubin
Contract or grant number: DAAG29-80-C-0041
Performing organization: Mathematics Research Center, University of Wisconsin,
610 Walnut Street, Madison, Wisconsin 53706
Program element, project, task area and work unit numbers: Work Unit Number 4 -
Statistics and Probability
Controlling office: U. S. Army Research Office, P.O. Box 12211,
Research Triangle Park, North Carolina 27709
Report date: February 1982
Number of pages: 8
Security classification: UNCLASSIFIED
Distribution statement: Approved for public release; distribution unlimited.
Key words: Robust Regression, IRLS, EM Algorithm, Maximum Likelihood Estimation,
M-Estimates, Weighted Regression, t-distribution